SQL (Structured Query Language, pronounced both as S-Q-L and Sequel) is a programming language designed to interface with data held in a relational database. These are databases within which you can hold different kinds of entries and define relationships between them. For example, you can define many Star entries and many Observation entries and then define a relationship in that each Observation is of a specific Star or of multiple Stars.
This can be exceptionally useful in working with large sets of data. Each entry in your database has a set of columns, which can be strings, floats, ints, or booleans (though SQLAlchemy can perform some tricks to expand this list). The power of using databases comes from the ability to query your data according to these columns. For example, the Star entry can have an RA column and a DEC column, and so, you could query for a list of Stars in a particular part of the sky.
But this is thinking too small. With a relational database, you could get a list of all the stars in your database with at least three observations with good seeing above a particular magnitude in only a few lines of code. And now, with SQLAlchemy, you can do it all in Python.
SQLAlchemy is a Python module that acts as an interface between Python code and a SQL database. Effectively, you can write object classes in Python whose instances are rows in a particular table in your database. This lets you manage your database straight from Python, giving you access to a full programming language of possibilities. I'll offer some practical examples in the Practical Examples notebook, but here's an example I've done in my research: you can generate plots directly from the database using an arbitrary query, and can use the relationships between database entries to make interactive plots.
For more details on SQLAlchemy, check out their website http://www.sqlalchemy.org. In particular, you should walk through their tutorial (of which I'll be giving a taste here).
A final note: there are other tools to do this in Python, and another that I've used is Django. I often find that Django has a much better user interface than SQLAlchemy; however, it isn't really designed to be used just for its databasing. Django is a web framework, and extracting the database part of the framework looks like it would be a bit tricky. If your goal is to write a database for a website, though, I'd definitely recommend Django over SQLAlchemy.
Instructions: Sentences in bold are instructions. These notebooks will not work unless the instructions are completed. Another note: if you find that SQLAlchemy is complaining that you are modifying an existing table, you should restart the notebook (the circular arrow in the toolbar above) and execute all the cells down from the top. This is a quirk of SQLAlchemy, and one that hopefully will make sense with some experience.
To get started using SQLAlchemy, we have to create an Engine. This is the interface between SQLAlchemy and the actual database, and this step points SQLAlchemy to the database of interest.
For the moment, to avoid polluting your computer, we'll create a temporary database in memory as shown below. To create a real database, the first argument to create_engine should be 'dbtype://path/file_name'. The echo argument is for debugging purposes. We'll turn it on now for you to see a bit of what's going on behind the scenes, but feel free not to set it.
In [2]:
from sqlalchemy import create_engine
engine = create_engine ('sqlite:///:memory:', echo = True)
Now that we have the interface to our database, we need to define a table to put into the database. SQLAlchemy allows you two ways to do this: the declarative way and the classical way. For the most part, the declarative method is more "Pythonic," so I'm going to focus on that one. Let's make a Star entry. To do this, we need to load the declarative method.
In [3]:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
This generates a base class from which all of our entries will inherit. Effectively, this base class sets up all the methods that our entry object will need to work as part of a database. Try adding ra and dec float columns.
In [4]:
from sqlalchemy import Column, Integer, String, Float
class Star (Base):
__tablename__ = 'stars' # This is the name of the table in the database. Must be unique
# After the table's name, we add the desired columns. These contain the queryable information about
# the current entry.
id = Column (Integer, primary_key = True) # This is the primary key of the entry.
# The primary key is the unique identifier of the entry. It's usually best to leave it as an int.
name = Column (String, unique = True) # The unique argument enforces uniqueness for the column
def __repr__ (self):
# A string representation of the object
# We can call the various columns with self.column_name, as with any normal class variable
return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)
# CheckConstraint('col2 > col3 + 5', name='check1')
Once the class has been declared, we need to create it in the database, i.e. we need to connect the Engine class to our new Star class. We can do this through our Base class, which will add any subclasses it can find as tables in our database. If you set echo = True in the creation of your engine, this will be a bit verbose.
In [8]:
Base.metadata.create_all (engine)
Congratulations! You've made a database! Of course, making it and using it are two different things. To interface with the database, you need a Session object. This object serves as your interface to SQLAlchemy. With it, you can create, delete, modify, and query entries. The syntax follows:
In [9]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker (bind = engine)
Note that Session is a class, not a class instance. When you want an instance of a session, you will need to do so explicitly (as I'll show in a moment).
To give an idea of how to use this, all of the above code would exist as a Python module that defines your database and gets it ready for use. When you need to query the database, you import this module and run, for example:
In [7]:
session = Session () # Start the SQLAlchemy session
newStar = Star (name = "Polaris", ra = 2.5303, dec = 89.2641) # Make a Star to put into the database
session.add (newStar) # Add newStar to the database
We've almost added a new object to the database. SQLAlchemy wants to be very careful to prevent you from making mistakes. To commit the new objects to the database, use
In [8]:
session.commit () # Commit the changes to the database
And that's it! You've successfully added Polaris to the database. Try adding a few more; these will be useful later. You can also try modifying the properties of newStar, but don't commit your changes just yet.
In [8]:
We can always check whether SQLAlchemy has anything to commit with session.new and session.dirty. Note that we don't have to add newStar into the database again, the session's already tracking it.
In [9]:
session.new
Out[9]:
In [10]:
session.dirty
Out[10]:
In [11]:
session.commit ()
To delete an object from the database, just use session.delete.
In [12]:
session.delete (newStar)
Finally, say we didn't want to commit that last command. Just use the rollback method of session.
In [13]:
session.rollback ()
That's all there is to adding and deleting entries with SQLAlchemy! But of course, we haven't actually checked that the things we've added are in the database. This brings us to our next point: querying the database.
The power of databases comes from your ability to query them. Effectively, you need to tell SQLAlchemy what to look for in the database. An interesting note here: when you call a query, this returns a Query object, not the results of that query. You'll see what I mean in a minute.
To query the database, use the query method of the session object. This takes as arguments the types of objects you'd like to query. In this case, we're querying the Star object.
In [14]:
session.query (Star)
Out[14]:
As I said, this returns a Query object. The power of this is that SQLAlchemy won't actually look through the database until it actually needs to. This can be very useful when you start to use filters, but for the moment, let's figure out how to retrieve objects from the database:
In [15]:
query = session.query (Star)
query.all () # Outputs a list of everything that matches the query
Out[15]:
In [16]:
query.order_by (Star.ra).first () # Outputs the first entry found, should be used with order_by
Out[16]:
In [21]:
query.count () # Outputs the number of matches
Out[21]:
In [23]:
query.one () # If there's only one match, output it; otherwise, raise an error
These queries can be filtered and ordered as well.
In [25]:
query.filter (Star.ra < 5).all ()
Out[25]:
In [28]:
query.order_by (Star.dec).all ()
Out[28]:
But remember, this is Python. Star.ra and Star.dec are Column objects, and the SQLAlchemy team have added algebraic operations to these objects, so you can filter and order by combinations of columns.
In [30]:
query.filter (Star.ra - Star.dec / 2 < 5).all ()
Out[30]:
The filter method returns another Query object, so you can continue to add filters and orderings.
In [37]:
query.filter (Star.ra < 5).filter (Star.dec > 0).all ()
Out[37]:
Feel free to play around. You might want to add some more entries to the database so that you can truly see what you can do. In what order do things pop out when you order by one column and then another?
In [37]:
You now have a good amount of experience with adding entries to a table that already exists and with querying that database using filters. For more information on this, check out the documentation (or just search in StackOverflow). It will generally help if you can figure out the terminology for what you're trying to do in SQL (e.g. union, subquery), as that will help narrow your search.
If you're bored of normal, non-relational databases, follow the link to the next notebook! If you want to learn how to add constraints to your objects, keep reading.
Another convenient aspect of databases is the ability to add constraints. These will effectively set certain conditions on anything added to the database and raise an error if anything that violates these constraints is added to the database. You've already seen one of these constraints, the unique constraint, which forces a particular column to be unique in the database. However, we can take this a step further. What if we want two columns to be unique together? For example, what if we want to make sure that no two stars in our database have the exact same RA and the exact same Dec? We do this by explicitly adding a constraint.
In [3]:
from sqlalchemy import UniqueConstraint
from sqlalchemy import Column, Integer, String, Float
class Star (Base):
__tablename__ = 'stars' # This is the name of the table in the database. Must be unique
# After the table's name, we add the desired columns. These contain the queryable information about
# the current entry.
id = Column (Integer, primary_key = True) # This is the primary key of the entry.
# The primary key is the unique identifier of the entry. It's usually best to leave it as an int.
name = Column (String, unique = True) # The unique argument enforces uniqueness for the column
ra = Column (Float)
dec = Column (Float)
__table_args__ = (UniqueConstraint ("ra", "dec"),)
def __repr__ (self):
# A string representation of the object
# We can call the various columns with self.column_name, as with any normal class variable
return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)
Paste this new definition of Star up to the original definition and restart the notebook (each time you change the table in the database, you'll most likely want to nuke the database and bring it back from scratch; there are ways to avoid this, called migrations, but these are quite complicated and usually not worth it except in production level databases). Try adding two stars with different names but the same RA and Dec and see if you get an error.
In [ ]:
The new piece we've added is the __table_args__ member of the object. This is a special object that SQLAlchemy will read to add special commands to your database. Among other things, constraints are placed here. This member should be a tuple of all the constraints you want to add to your object. Let's look at another type of constraint, the check constraint. This makes sure that certain conditions are met and has the following notation:
In [8]:
from sqlalchemy import CheckConstraint
CheckConstraint ("ra < 24")
Out[8]:
Try adding this constraint in addition to a minimum for RA and likewise for Dec to your Star.
In [ ]: